Abstract: This paper extends the "single crossing point" property
of the scalar MMSE function, derived by Guo, Shamai and
Verdú (first presented in ISIT 2008), to the parallel degraded
MIMO scenario. It is shown that the matrix Q(t), which is 
the difference between the MMSE assuming a Gaussian input 
and the MMSE assuming an arbitrary input, has, at most, a 
single crossing point for each of its eigenvalues. Together with 
the I-MMSE relationship, a fundamental connection between 
Information Theory and Estimation Theory, this new property 
is employed to derive results in Information Theory. As a simple 
application of this property we provide an alternative converse 
proof for the broadcast channel (BC) capacity region under 
covariance constraint in this specific setting. 

I. Introduction 

A fundamental relationship between estimation theory and 
information theory for Gaussian channels was presented in 
[1]; in particular, it was shown that for the MIMO standard 
Gaussian channel, 

$$Y = \sqrt{\mathsf{snr}}\, H X + N \tag{1}$$

where N is a standard Gaussian n-dimensional random vector 
and H is a fixed channel matrix known to the receiver, 
then regardless of the input distribution on X, the mutual 
information and the minimum mean-square error (MMSE) are 
related (assuming real-valued inputs/outputs) by 



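$$\frac{d}{d\,\mathsf{snr}} I(X; \sqrt{\mathsf{snr}}\, H X + N) = \frac{1}{2} E\left\{ \left\| H X - H E\{X \mid \sqrt{\mathsf{snr}}\, H X + N\} \right\|^2 \right\}.$$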



Here E{X|Y} stands for the conditional mean of X given Y.
This fundamental relationship and its generalizations [1], [2], 
referred to as the I-MMSE relationships, have already been 
shown to be useful in several aspects of information theory: 
providing insightful proofs for entropy power inequalities [3], 
revealing the mercury/waterfilling optimal power allocation 
over a set of parallel Gaussian channels [4] and recently generalizing
this result to MIMO Gaussian channels in [5], tackling
the weighted sum-MSE maximization in MIMO broadcast 
channels [6], illuminating extrinsic information of good codes 
[7], and enabling a simple proof of the monotonicity of the
non-Gaussianness of independent random variables [8]. In 
[9], [10] and later in [11] it has been shown that using this 
relationship one can provide insightful and simple proofs for 
multi-user single antenna problems such as the BC, the secrecy 
capacity problem, and the multi-receiver secrecy capacity 
region. In [12] this approach has been extended to the MIMO 
Gaussian wiretap channel yielding a closed form expression 
for the secrecy capacity. In order to provide the converse proof 
of the BC capacity region in [9], [10], the authors proved an 
inherent property of the MMSE, the "single crossing point" 
property: as a function of snr, the MMSE of the Gaussian input 
distribution and the MMSE of an arbitrary input distribution 
intersect at most once. This property is stronger than required
in order to prove the BC capacity region; however, it is an
interesting property in its own right.
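
As a numerical illustration of the scalar property recalled above, the following sketch compares the MMSE of a Gaussian input with that of an equiprobable binary input in {-1, +1} and counts the sign changes of their difference over a grid of snr values. The binary input, the Gaussian variance of 0.5, the snr grid and all function names are illustrative assumptions rather than choices made in this paper; the binary MMSE is evaluated with the standard expression 1 - E{tanh(snr - sqrt(snr) Z)}, Z standard Gaussian, using a simple Riemann sum.

```python
import numpy as np

# Uniform grid used to integrate against the standard normal density.
Z = np.linspace(-12.0, 12.0, 8001)
DZ = Z[1] - Z[0]
PHI = np.exp(-0.5 * Z**2) / np.sqrt(2.0 * np.pi)


def mmse_gaussian(snr, var):
    """MMSE of X ~ N(0, var) observed through Y = sqrt(snr) X + N."""
    return var / (1.0 + var * snr)


def mmse_binary(snr):
    """MMSE of equiprobable X in {-1, +1}: 1 - E[tanh(snr - sqrt(snr) Z)]."""
    return 1.0 - np.sum(PHI * np.tanh(snr - np.sqrt(snr) * Z)) * DZ


# Illustrative assumption: Gaussian input of variance 0.5, snr in [0, 20].
snr_grid = np.linspace(0.0, 20.0, 401)
q = np.array([mmse_gaussian(s, 0.5) - mmse_binary(s) for s in snr_grid])

# Number of sign changes of q(snr) along the grid.
sign_changes = int(np.sum(np.sign(q[:-1]) != np.sign(q[1:])))
print("sign changes of q(snr):", sign_changes)
```

The difference starts negative (the Gaussian input has the smaller variance) and ends positive (the binary MMSE decays much faster at high snr), so on this grid the two curves cross only once.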

Motivated by this approach, our goal is to examine the prop- 
erties of the MMSE matrix in the MIMO scenario, and relate 
these properties to the mutual information. As an initial model 
we have chosen the following simplified parallel channel, 



$$Y = H X + N \tag{2}$$



where X, Y and N are n-dimensional random vectors, and 
N is standard Gaussian. H is assumed diagonal and positive 
semidefinite. Note that the components of X are not necessarily
independent. If they were, the "single crossing point" property
could be deduced directly from the scalar "single crossing
point" property of the individual components.

As pointed out earlier, in the scalar channel scenario we 
have seen that a "single crossing point" between the MMSE 
of the Gaussian input and the MMSE of an arbitrary input dis- 
tribution exists as a function of snr. The current more complex 
scenario, in which we have diagonal channel matrices, raises 
two questions: What scalar function of the MMSE matrix 
should we examine for an analogous property to the "single 
crossing point"? And, along what n x n-dimensional path
should we look (since in our parallel MIMO scenario there are 
multiple n x n-dimensional paths between every two channel 



matrices)? For consistency, even before answering these two 
questions, we would like to emphasize that the path will be 
parameterized through the scalar parameter t, thus instead of 
H we will write R(t), to avoid confusion. 

The paper is organized as follows: Section II contains the
most basic definitions used in this work. Section III details 
our choice of path and gives some preliminary results used to 
prove our primary result. Section IV contains our main result, 
which is an extension of the single crossing point property to 
the parallel degraded MIMO scenario. Section V connects the 
result of the previous section to the mutual information using 
the I-MMSE relationship. Finally, Section VI demonstrates
how we can use this property to provide an alternative converse 
proof for the BC capacity region under covariance constraint 
in the parallel degraded MIMO setting. 

II. Definitions 
We now formally give the definition of the MMSE matrix: 

$$E(t) = E\left\{ \left(X - E\{X \mid R(t)X + N\}\right) \left(X - E\{X \mid R(t)X + N\}\right)^T \right\} \tag{3}$$



where R(t) corresponds to the channel matrix H. The param- 
eter t determines the channel matrix, thus the new variable 
R(t) comes to highlight the dependence of the channel on the 
parameter t. 

In this work we use the following I-MMSE relationship 
derived by Palomar and Verdú in [2]:



$$\nabla_H I(X; H X + N) = H E \tag{4}$$



where H is a fixed, known channel matrix, and N is a
standard Gaussian additive noise. Rewriting this relationship
as an integration along a path R(t) from t = 0 to t' results
in the following expression:



$$I(X; Y(t')) = I(X; R(t')X + N) = \int_{0}^{t'} \mathrm{tr}\left( \left(R(t)E(t)\right)^T R'(t) \right) dt \tag{5}$$

where $R'(t) = \frac{dR(t)}{dt}$.
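
As a sanity check of (5), the sketch below evaluates both sides for a Gaussian input, for which closed forms are available: the MMSE matrix is $(C^{-1} + R^T R)^{-1}$ and the mutual information is $\frac{1}{2}\log|I + R C R^T|$ (in nats). The linear path R(t) = tH, the matrices H and C, the midpoint integration rule and the function names are assumptions made only for this illustration.

```python
import numpy as np


def gaussian_mmse(R, C):
    """Error covariance of X ~ N(0, C) given Y = R X + N, N standard Gaussian."""
    return np.linalg.inv(np.linalg.inv(C) + R.T @ R)


def gaussian_mutual_info(R, C):
    """I(X; R X + N) in nats for X ~ N(0, C)."""
    n = R.shape[0]
    _, logdet = np.linalg.slogdet(np.eye(n) + R @ C @ R.T)
    return 0.5 * logdet


# Hypothetical end point R(1) = H and input covariance C (positive definite).
H = np.diag([0.5, 1.0, 2.0])
C = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 1.5]])


def path_integral(H, C, steps=4000):
    """Midpoint-rule approximation of the integral in (5) along R(t) = t H."""
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) / steps
        R = t * H          # point on the path
        R_prime = H        # derivative of the path with respect to t
        total += np.trace((R @ gaussian_mmse(R, C)).T @ R_prime)
    return total / steps


lhs = gaussian_mutual_info(H, C)
rhs = path_integral(H, C)
print(lhs, rhs)
```

Any other non-decreasing diagonal path from the zero matrix to H could be used instead of the linear one; only `R` and `R_prime` inside the loop would change.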

In this work we specifically examine the properties of the 
difference between the MMSE resulting from an arbitrary 
input distribution and a Gaussian input distribution (not neces- 
sarily having the same covariance matrix). As such, we require 
the following definition: 



$$Q(t) = E_G(t) - E(t) \tag{6}$$

where we have denoted by $E_G(t)$ the MMSE matrix assuming a
general Gaussian input distribution.

III. Preliminaries 

We begin this section by presenting our choice of path, that 
is, we provide an answer to the second question presented in 
the Introduction: Along what n x n-dimensional path should 
we look? In the scalar "single crossing point" property [9], 
[10], the MMSE is given as a function of snr, a non-negative 
value. The change in MMSE is examined as snr monotonically 



increases. When switching to the MIMO scenario, our choice 
was to mimic the properties of the scalar scenario, that is, 
we show that there exists a non-negative, monotonically
non-decreasing path between any two diagonal matrices $H_1$ and
$H_2$, such that $0 \preceq H_1 \preceq H_2$, as given in the following
lemma.

Lemma 1. For any two diagonal matrices $H_1$ and $H_2$,
such that $0 \preceq H_1 \preceq H_2$, there exists a non-negative,
monotonically non-decreasing path $R(t)$ for all $t \in [0, 1]$ such
that the following holds:

$$R(t = 0) = 0, \quad R(t_1) = H_1 \quad \text{and} \quad R(t_2 = 1) = H_2 \tag{7}$$

where $0 < t_1 < t_2 = 1$.



Proof: We need to define a function, $g_i(t)$, for each
diagonal element $i$. It suffices to choose any non-negative
function $f_i(t)$ such that the area from $0$ to $t_1$ will equal $[H_1]_{ii}$
and the area from $t_1$ to $t_2$ will equal $[H_2]_{ii} - [H_1]_{ii}$. Given
that, we can set the function to be $g_i(t) = \int_0^t f_i(t')dt'$. The
entire path, $R(t)$, will be given by:

$$R(t) = \mathrm{diag}\{g_1(t), \ldots, g_n(t)\}. \tag{8}$$

As required, this path passes through the zero matrix at $t = 0$,
$H_1$ at $t_1$ and $H_2$ at $t_2 = 1$. Since the $f_i(t)$ are chosen
non-negative for all $i$ we have a non-negative and monotonically
non-decreasing path for all $t \in [0, 1]$. ■
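
The construction in the proof can be sketched numerically as follows. This is only one admissible choice: each $f_i$ is taken piecewise constant, so each $g_i$ is piecewise linear. The function name `make_path` and the example values are hypothetical.

```python
import numpy as np


def make_path(h1, h2, t1):
    """Return R(t) with R(0) = 0, R(t1) = diag(h1), R(1) = diag(h2).

    h1, h2 hold the diagonal entries of H_1 and H_2 (0 <= h1 <= h2
    elementwise) and 0 < t1 < 1.
    """
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    if not (0.0 < t1 < 1.0) or np.any(h1 < 0.0) or np.any(h2 < h1):
        raise ValueError("need 0 < t1 < 1 and 0 <= h1 <= h2 elementwise")

    def R(t):
        # g_i(t) integrates a piecewise-constant, non-negative f_i:
        # f_i = h1_i / t1 on [0, t1] and (h2_i - h1_i) / (1 - t1) on [t1, 1].
        if t <= t1:
            g = h1 * (t / t1)
        else:
            g = h1 + (h2 - h1) * ((t - t1) / (1.0 - t1))
        return np.diag(g)

    return R


# Example with hypothetical channel gains.
R = make_path([0.5, 1.0], [1.0, 3.0], t1=0.4)
print(np.diag(R(0.0)), np.diag(R(0.4)), np.diag(R(1.0)))
```

Any other non-negative $f_i$ with the same two areas would serve equally well; the piecewise-constant choice is made here only because it keeps $R(t)$ in closed form.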
We now provide some preliminary results that will
be central in the sequel.

Lemma 2 ([13, Ch. 4, Sec. 11]). Let $\lambda_i$ and $u_i$ indicate the
$i$-th eigenvalue (assumed of multiplicity 1) of the matrix $Z$ and
its corresponding eigenvector, respectively. Then, it follows
that

$$D_Z \lambda_i = u_i^T \otimes u_i^T \tag{9}$$

where $D$ is the Jacobian operator, whose definition can be
found in [13].

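Corollary 1. If the matrix $Z$ depends on a real scalar
parameter $\tau$, i.e., $Z = Z(\tau)$, then, assuming $\lambda_i$ is an
eigenvalue of multiplicity 1, applying the chain rule, we get:

$$\begin{aligned}
\frac{d\lambda_i(Z(\tau))}{d\tau} &= D_{Z(\tau)} \lambda_i \, D_\tau Z(\tau) && (10) \\
&= (u_i^T \otimes u_i^T)\, \mathrm{vec}(Z'(\tau)) && (11) \\
&= u_i^T Z'(\tau) u_i && (12)
\end{aligned}$$

where the last equality follows from [13, Ch. 2, Th. 2.2].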

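Corollary 2. If, given a $\tau = \tau_0$, the matrix $Z(\tau_0)$ is diagonal,
we can always take $[u_i]_j = \delta_{ij}$ and, thus, the result in
Corollary 1 particularizes to

$$\left. \frac{d\lambda_i(Z(\tau))}{d\tau} \right|_{\tau = \tau_0} = \left. \frac{d[Z(\tau)]_{ii}}{d\tau} \right|_{\tau = \tau_0}. \tag{13}$$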
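
Both corollaries can be checked numerically with finite differences. The sketch below uses a hypothetical symmetric family $Z(\tau) = A + \tau D$ (so that $Z'(\tau) = D$) with distinct eigenvalues; the matrices, the step size and the function names are assumptions for illustration only.

```python
import numpy as np

# Hypothetical symmetric family Z(tau) = A + tau * D, hence Z'(tau) = D.
A = np.diag([1.0, 2.0, 4.0])
D = np.array([[0.5, 1.0, -0.3],
              [1.0, -0.2, 0.7],
              [-0.3, 0.7, 0.1]])


def Z(tau):
    return A + tau * D


def eig_derivative(tau):
    """u_i^T Z'(tau) u_i for every eigenpair of Z(tau), as in (12)."""
    _, U = np.linalg.eigh(Z(tau))          # columns of U are the u_i
    return np.einsum("ji,jk,ki->i", U, D, U)


def finite_difference(tau, h=1e-6):
    """Central difference of the (sorted) eigenvalues of Z(tau)."""
    lam_plus = np.linalg.eigvalsh(Z(tau + h))
    lam_minus = np.linalg.eigvalsh(Z(tau - h))
    return (lam_plus - lam_minus) / (2.0 * h)


# Corollary 1 at a generic point of the path.
tau = 0.3
print(eig_derivative(tau), finite_difference(tau))

# Corollary 2 at tau_0 = 0, where Z(tau_0) = A is diagonal:
# the eigenvalue derivatives reduce to the diagonal entries of Z'(tau_0).
print(finite_difference(0.0), np.diag(D))
```

The comparison relies on the eigenvalues staying well separated around the evaluation points, so that sorting them preserves their identity across the finite-difference step.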



Remark 1. Observe that the results in Lemma 2 and Corollary
1 are valid only for the case where the multiplicity of $\lambda_i$ is
equal to 1. However, as explained in [13, Ch. 8, Sec. 12] and
formally stated in [13, Ch. 8, Sec. 12, Th. 13], in our case,
the result in Corollary 2 can be applied directly to the case
where the multiplicity of $\lambda_i$ is greater than 1.

IV. Single Crossing Point for each Eigenvalue 

Before stating our primary result we require a lower bound
on the derivative of the matrix Q(t), defined in (6), which is
given in the following lemma.

Lemma 3. The following lower bound holds: 

$$Q'(t) \succeq 2 \left( E(t)B(t)E^T(t) - E_G(t)B(t)E_G^T(t) \right) \tag{14}$$

where B(t) = R(t)R'(t) is a diagonal matrix. 

Proof: We first provide the derivative of the MMSE with 
respect to the parameter t. Using the chain rule given in [14, 
equations (65-66)], 

$$D_t E_{ij}(t) = D_R E_{ij}(t)\, D_t R(t) = \mathrm{tr}\left( \left( \frac{\partial E_{ij}(t)}{\partial R(t)} \right)^T R'(t) \right). \tag{15}$$



Since R(t) is diagonal, the last expression can be further 
simplified to 



$$D_t E_{ij}(t) = \sum_{l} \frac{\partial E_{ij}(t)}{\partial R_{ll}(t)} R'_{ll}(t). \tag{16}$$

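Using the result ([14, eq. (131)]),

$$\begin{aligned}
D_{R_{ll}(t)} E_{ij}(t) &= -E\left\{ \Phi_X(Y)_{jl} [\Phi_X(Y) R(t)^T]_{il} + \Phi_X(Y)_{il} [\Phi_X(Y) R(t)^T]_{jl} \right\} \\
&= -E\left\{ \Phi_X(Y)_{jl} \Phi_X(Y)_{il} R(t)_{ll} + \Phi_X(Y)_{il} \Phi_X(Y)_{jl} R(t)_{ll} \right\} \\
&= -2 R_{ll}(t) E\left\{ \Phi_X(Y)_{jl} \Phi_X(Y)_{il} \right\}
\end{aligned} \tag{17}$$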

where

$$\Phi_X(y) = E\left\{ (X - E\{X \mid y\})(X - E\{X \mid y\})^T \mid y \right\}. \tag{18}$$

Note that $\Phi_X(y)$ depends on $t$ through $Y(t) = R(t)X + N$.
The second equality in equation (17) is due to the fact that
$R(t)$ is diagonal. Thus, we can write the derivative of $E_{ij}(t)$
as

$$\begin{aligned}
D_t E_{ij}(t) &= -2 \sum_{l} R_{ll}(t) E\left\{ \Phi_X(Y)_{jl} \Phi_X(Y)_{il} \right\} R'_{ll}(t) \\
&= -2 \sum_{l} R_{ll}(t) R'_{ll}(t) E\left\{ \Phi_X(Y)_{jl} \Phi_X(Y)_{il} \right\} \\
&= -2 \sum_{l} B_{ll}(t) E\left\{ \Phi_X(Y)_{jl} \Phi_X(Y)_{il} \right\}
\end{aligned} \tag{19}$$

recalling that $B_{ll}(t) = R_{ll}(t) R'_{ll}(t)$. We can put this expression
into a matrix form as follows:



$$D_t E(t) = -2 \sum_{l} B_{ll}(t) E\left\{ \Phi_X(Y)_l \Phi_X(Y)_l^T \right\} \tag{20}$$

where $\Phi_X(Y)_l$ is the $l$-th column of the matrix $\Phi_X(Y)$. Using
the fact that for a Gaussian input distribution $\Phi_X(Y)$ does not
depend on $Y$ and, thus, $\Phi_X(Y) = E\{\Phi_X(Y)\} = E_G(t)$ [14],
we can obtain the following lower bound on the derivative of
the matrix $Q(t)$:

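$$\begin{aligned}
Q'(t) &= 2 \sum_{l} B_{ll}(t) \left( E\left\{ \Phi_X(Y)_l \Phi_X(Y)_l^T \right\} - E^G_l (E^G_l)^T \right) \\
&\succeq 2 \sum_{l} B_{ll}(t) \left( E\{\Phi_X(Y)_l\} E\{\Phi_X(Y)_l\}^T - E^G_l (E^G_l)^T \right) \\
&= 2 \sum_{l} B_{ll}(t) \left( E_l E_l^T - E^G_l (E^G_l)^T \right) \\
&= 2 \left( E(t)B(t)E^T(t) - E_G(t)B(t)E_G^T(t) \right)
\end{aligned}$$

where $E_l$ and $E^G_l$ denote the $l$-th columns of $E(t)$ and $E_G(t)$, respectively, and the inequality is due to Jensen. ■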
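
For a Gaussian input, where $\Phi_X(Y) = E_G(t)$, expression (20) reduces to $D_t E_G(t) = -2 E_G(t) B(t) E_G^T(t)$, which can be checked by finite differences. The sketch below does so under illustrative assumptions: a linear path $R(t) = t\,\mathrm{diag}(h)$, a hypothetical covariance C, and the closed form $E_G(t) = (C^{-1} + R(t)^T R(t))^{-1}$ for the Gaussian MMSE matrix.

```python
import numpy as np

# Hypothetical Gaussian input covariance and diagonal path R(t) = t * diag(h).
C = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 1.5]])
h = np.array([0.5, 1.0, 2.0])


def R(t):
    return t * np.diag(h)


def E_G(t):
    """Gaussian MMSE matrix (C^{-1} + R(t)^T R(t))^{-1}."""
    Rt = R(t)
    return np.linalg.inv(np.linalg.inv(C) + Rt.T @ Rt)


t0 = 0.7
B = R(t0) @ np.diag(h)            # B(t0) = R(t0) R'(t0), diagonal
E0 = E_G(t0)

# Right-hand side of (20) for a Gaussian input, in matrix and column form.
matrix_form = -2.0 * E0 @ B @ E0.T
column_form = -2.0 * sum(B[l, l] * np.outer(E0[:, l], E0[:, l])
                         for l in range(len(h)))

# Central finite difference of E_G(t) at t0.
step = 1e-6
numeric = (E_G(t0 + step) - E_G(t0 - step)) / (2.0 * step)
print(np.max(np.abs(numeric - matrix_form)))
```

The agreement between `matrix_form` and `column_form` is the same identity that turns the sum over columns into $E(t)B(t)E^T(t)$ in the last step of the proof.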
Let us fix $t_0 > 0$ and consider the generalized eigenvalue
decomposition [15] on $(E_G(t_0), E(t_0))$.^1 Thus, there exists
an invertible matrix $V_0$ such that,

$$E_G(t_0) = V_0^T V_0, \qquad E(t_0) = V_0^T S_0 V_0 \tag{21}$$

where $S_0$ is a positive semi-definite diagonal matrix. Thus,

$$Q(t_0) = E_G(t_0) - E(t_0) = V_0^T (I - S_0) V_0 \tag{22}$$

and the following matrix:

$$\tilde{Q}(t_0) = V_0^{-T} Q(t_0) V_0^{-1} \tag{23}$$

is diagonal. By defining $C_0 = V_0 B(t_0) V_0^T$, we can rewrite
the lower bound attained in Lemma 3 as follows:

$$Q'(t_0) \succeq 2 V_0^T \left( S_0 C_0 S_0 - C_0 \right) V_0. \tag{24}$$

Our main result is the following:

Theorem 1. Each eigenvalue of Q(t) crosses the horizontal 
axis at most once. 

Proof: Consider the new matrix function $\tilde{Q}(t) = V_0^{-T} Q(t) V_0^{-1}$,
which, from Sylvester's law of inertia, has
the same number of positive, negative, and zero eigenvalues
as $Q(t)$.

Note that $\tilde{Q}(t_0) = I - S_0$ is a diagonal matrix and, thus,
we can apply Corollary 2 to obtain a lower bound on the
derivative of the eigenvalues of $\tilde{Q}(t)$ evaluated at $t_0$:

$$\left. \frac{d\lambda_i(\tilde{Q}(t))}{dt} \right|_{t=t_0} = \left. \frac{d[\tilde{Q}(t)]_{ii}}{dt} \right|_{t=t_0} = \left[ V_0^{-T} Q'(t_0) V_0^{-1} \right]_{ii} \geq 2 \left( [S_0]_{ii}^2 [C_0]_{ii} - [C_0]_{ii} \right) \tag{25}$$

where in (25) we applied the lower bound given in Lemma 3.

Now, let us particularize the bound obtained in (25) to the
non-positive eigenvalues of $\tilde{Q}(t_0)$, i.e., those that fulfill that
$\lambda_i(\tilde{Q}(t_0)) \leq 0$ which implies that $[S_0]_{ii} \geq 1$, from which it
follows that

$$\left. \frac{d\lambda_i(\tilde{Q}(t))}{dt} \right|_{t=t_0} \geq 2 \left( [S_0]_{ii}^2 [C_0]_{ii} - [C_0]_{ii} \right) \geq 0 \tag{26}$$

where we have used the fact that $B(t) \succeq 0$.

^1 In the generalized eigenvalue decomposition we have considered that
$E_G \succ 0$. A sufficient condition for $E_G \succ 0$ is that the covariance of
the Gaussian input distribution is non-singular and that $R(t)$ is non-singular.

The last result implies that, in a sufficiently small neighborhood
of $t_0$, the non-positive eigenvalues of $\tilde{Q}(t)$ are
non-decreasing functions of $t$. Consequently, from the continuity
of the eigenvalues, the number of negative eigenvalues of $Q(t)$
cannot increase.

Now, taking into account that the number of positive, zero,
and negative eigenvalues is preserved under the transformation
$Q(t) \mapsto \tilde{Q}(t)$ we will informally show that a zero eigenvalue
of $Q(t)$ cannot become negative (a complete formal proof
is given in [16]). This will prove that each eigenvalue of
$Q(t)$ crosses the horizontal axis at most once. We will prove
this result by contradiction. Let's assume that there is a zero
eigenvalue of $Q(t_0)$ that becomes negative for $t > t_0$. Since
the number of negative eigenvalues cannot increase, there must
be at least one negative eigenvalue at $t_0$ that increases to zero.
If we examine the sign of the eigenvalues at $t_0 + \Delta$ for a
sufficiently small $\Delta$, we know that the zero eigenvalue has to
be negative, however the negative eigenvalue (for sufficiently
small $\Delta$) is also still negative. Thus, we will have an increase
in the number of negative eigenvalues, contradicting the property
of no increase in negative eigenvalues of $Q(t)$. This shows
that a zero eigenvalue cannot become negative in $Q(t_0)$. ■

The following corollary is a simple consequence from
Theorem 1.

Corollary 1. If, for a given $t'$, the function $Q(t')$ fulfils that
$Q(t') \succeq 0$ then for all $t > t'$ we also have that $Q(t) \succeq 0$.

Note that, by restricting the input distributions to be i.i.d.,
the matrix $Q(t)$ is a diagonal matrix for all $t$. Thus, the
single crossing property of the eigenvalues simplifies to a
single crossing property of the diagonal values, as expected
due to the scalar single crossing property [9], [10]. However,
in the general case, where the input distribution is arbitrary, the
multivariate unique crossing property does not follow directly
from the scalar case.

V. Connecting to the Mutual Information

As in the scalar scenario, our goal is to use the "single
crossings" in order to derive results regarding the mutual
information. According to the I-MMSE relationship (5), we
would like to examine the following function:

$$\mathrm{tr}\{B(t)E_G(t)\} - \mathrm{tr}\{B(t)E(t)\} = \mathrm{tr}\{B(t)Q(t)\}. \tag{27}$$

Since the trace is the sum of the eigenvalues, we need
the following lemma, that extends our results regarding the
eigenvalues of $Q(t)$ to the eigenvalues of $B(t)Q(t)$ for a
positive semi-definite diagonal matrix $B(t)$.

Lemma 4. Each eigenvalue of $B(t)Q(t)$ crosses the horizontal
axis at most once.

Proof: Using,

$$\lambda_i\{B(t)Q(t)\} = \lambda_i\{B^{1/2}(t)Q(t)B^{1/2}(t)\} \tag{28}$$

with the fact that $B(t)$ is diagonal and positive semi-definite,
we again have the eigenvalues of a congruent transformation.
Thus, the extension of the previous claim follows directly. ■
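
The two congruence arguments used in the proofs of Theorem 1 and Lemma 4 can be sketched numerically. The block below builds $V_0$ and $S_0$ of (21) from a Cholesky factor and a symmetric eigendecomposition (one possible construction, not necessarily the one in [15]), checks that $V_0^{-T} Q V_0^{-1} = I - S_0$ has the same inertia as $Q$, and verifies the eigenvalue identity (28). The random test matrices, the diagonal of B, the tolerance and all function names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def random_spd(n):
    """Hypothetical symmetric positive definite test matrix."""
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)


def simultaneous_diagonalization(E_G, E):
    """Return (V, s) with E_G = V^T V and E = V^T diag(s) V."""
    L = np.linalg.cholesky(E_G)                    # E_G = L L^T
    L_inv = np.linalg.inv(L)
    s, U = np.linalg.eigh(L_inv @ E @ L_inv.T)     # L^{-1} E L^{-T} = U diag(s) U^T
    return U.T @ L.T, s


def inertia(M, tol=1e-9):
    """Numbers of positive, negative and (numerically) zero eigenvalues."""
    lam = np.linalg.eigvalsh(M)
    return (int(np.sum(lam > tol)), int(np.sum(lam < -tol)),
            int(np.sum(np.abs(lam) <= tol)))


n = 4
E_G, E = random_spd(n), random_spd(n)
V, s = simultaneous_diagonalization(E_G, E)
Q = E_G - E
V_inv = np.linalg.inv(V)
Q_tilde = V_inv.T @ Q @ V_inv                      # equals I - diag(s)

# Identity (28) for a diagonal, positive definite B.
b = np.array([0.5, 1.0, 2.0, 3.0])
B_half = np.diag(np.sqrt(b))
eig_BQ = np.sort(np.linalg.eigvals(np.diag(b) @ Q).real)
eig_sym = np.linalg.eigvalsh(B_half @ Q @ B_half)

print(inertia(Q), inertia(Q_tilde), inertia(B_half @ Q @ B_half))
```

With a strictly positive diagonal B, as assumed here, the congruence also preserves the inertia of Q.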
In order to use our results regarding the function Q(t) we 
need the following lemma: 

Lemma 5. For any $t'$, there exists a Gaussian input covariance
matrix $C_G$ such that the following holds:

1) $C_G \preceq C_X$

2) $I(X; Y(t')) = I(X_G; Y_G(t'))$

3) $Q(t') \succeq 0$

where $Y(t') = R(t')X + N$ and $Y_G(t') = R(t')X_G + N$.

Proof: Due to space limitations we will give only a
sketch of the proof. For full details see [16]. From the third
requirement we have:

$$Q(t') = E_G(t') - E(t') = J \succeq 0. \tag{29}$$

Furthermore, we can define:

$$C = E_L(t') - E(t') \succeq 0 \tag{30}$$

where $E_L(t')$ is the error covariance matrix assuming an
optimal linear estimator. When $J = 0$ ($E_G(t') = E(t')$) we
have that $Q(t') = 0$. According to Theorem 1 we have that
all eigenvalues are non-positive for $t < t'$. Furthermore, due
to Lemma 4 we conclude that the eigenvalues of $B(t)Q(t)$
are also non-positive for all $t < t'$ and we can conclude, using
(5), that $I(X_G; Y_G(t')) \leq I(X; Y(t'))$. If $J = C$ we have
that $C_G = C_X$ in which case we have $I(X_G; Y_G(t')) \geq
I(X; Y(t'))$. In order to comply with the first requirement
we need to require that $J \preceq C$. Thus, requirements 1 and 3
can be written using $J$ and $C$, defined in equations (29) and
(30) respectively, and we have the following:

$$\left. I(X_G; Y_G(t')) \right|_{J=0} \leq I(X; Y(t')) \leq \left. I(X_G; Y_G(t')) \right|_{J=C}$$

The question is whether there exists such a $J$ that will
also attain $I(X_G; Y_G(t')) = I(X; Y(t')) = a$. Both upper
and lower bound can be expressed using the function
r(t) = ^log | B ^^| which is continuous and monotonically 
decreasing in v for < v < 1 for A y 0, B y and A y 
[17]. Thus, there exists a $0 \leq v^* \leq 1$ that attains equality.
That is, $J^* = (1 - v^*)C$ attains all three requirements. ■
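
In the scalar case the three requirements of Lemma 5 can be illustrated numerically. The sketch below assumes, purely for illustration, a unit-variance equiprobable binary input in {-1, +1} at snr = 1; it finds the Gaussian variance whose mutual information matches that of the binary input and then evaluates the MMSE difference at the same snr. The binary mutual information and MMSE use the standard expressions snr - E{log cosh(snr - sqrt(snr) Z)} (in nats) and 1 - E{tanh(snr - sqrt(snr) Z)}, Z standard Gaussian, both integrated with a Riemann sum; all names are hypothetical.

```python
import numpy as np

# Uniform grid used to integrate against the standard normal density.
Z = np.linspace(-12.0, 12.0, 8001)
DZ = Z[1] - Z[0]
PHI = np.exp(-0.5 * Z**2) / np.sqrt(2.0 * np.pi)


def binary_mutual_info(snr):
    """I(X; sqrt(snr) X + N) in nats for equiprobable X in {-1, +1}."""
    u = snr - np.sqrt(snr) * Z
    return snr - np.sum(PHI * np.log(np.cosh(u))) * DZ


def binary_mmse(snr):
    """MMSE of equiprobable X in {-1, +1} at the given snr."""
    return 1.0 - np.sum(PHI * np.tanh(snr - np.sqrt(snr) * Z)) * DZ


snr = 1.0                                   # plays the role of t'
info = binary_mutual_info(snr)

# Requirement 2: Gaussian variance with 0.5 * log(1 + c_g * snr) = info.
c_g = (np.exp(2.0 * info) - 1.0) / snr

# Requirement 3: scalar Q(t') = Gaussian MMSE minus binary MMSE.
q = c_g / (1.0 + c_g * snr) - binary_mmse(snr)

print("c_g =", c_g, " q =", q)
```

Requirement 1 corresponds here to `c_g` not exceeding the unit variance of the binary input.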

VI. Application: parallel degraded MIMO 
Gaussian BC 

As an example for the usage of these results we examine the 
parallel degraded Gaussian BC. We first note that the results 
attained so far have also been extended to the conditioned case 
where (X, U) are jointly distributed and U — X — Y forms a 
Markov chain, but are omitted here due to space limitations. 



The conditioned MMSE is defined as:

$$E_u(t) = E\left\{ \left(X - E\{X \mid R(t)X + N, U\}\right) \left(X - E\{X \mid R(t)X + N, U\}\right)^T \right\} \tag{31}$$

and the conditioned matrix $Q_u(t) = E_G(t) - E_u(t)$, which
is the difference between the MMSE matrix assuming a
general Gaussian input distribution independent of $U$, and the
conditioned MMSE matrix.

We consider the degraded parallel Gaussian BC:

$$\begin{aligned}
Y_1[m] &= H_1 X[m] + N_1[m] \\
Y_2[m] &= H_2 X[m] + N_2[m]
\end{aligned} \tag{32}$$

where $N_1[m]$ and $N_2[m]$ are standard additive Gaussian noise
vectors, $H_1$ and $H_2$ are diagonal positive definite matrices
such that $H_1 \preceq H_2$. The channel input satisfies the covariance
constraint: $E\{XX^T\} \preceq S$, where $S$ is some positive definite
matrix.

One way of proving that the Gaussian input achieves the 
capacity region is by using the single-letter expression [18]: 



$$\begin{aligned}
R_1 &\leq I(U; Y_1) \\
R_2 &\leq I(X; Y_2 \mid U)
\end{aligned} \tag{33}$$

where $U$ is an auxiliary random variable over a certain
alphabet that satisfies the Markov relation $U - X - (Y_1, Y_2)$.
This was done for the scalar Gaussian BC in [9], [10]. We will 
try to follow similar steps for the degraded parallel Gaussian 
channel. 

Assume a pair $(X, U)$ with covariance matrix $C_X$. Using
the conditioned version of Lemma 5 we know that there exists
a Gaussian distribution, with covariance $B \preceq S$, with the
following properties:

$$I(X; H_1 X + N \mid U) = I(X_G; H_1 X_G + N) \tag{34}$$

$$\begin{aligned}
I(X_G; H_2 X_G + N) - I(X; H_2 X + N \mid U) &= \int_{t=0}^{t_2} \mathrm{tr}\{B(t)Q_u(t)\}\,dt \\
&= \int_{t=0}^{t_1} \mathrm{tr}\{B(t)Q_u(t)\}\,dt + \int_{t_1}^{t_2} \mathrm{tr}\{B(t)Q_u(t)\}\,dt && (35) \\
&= 0 + \int_{t_1}^{t_2} \mathrm{tr}\{B(t)Q_u(t)\}\,dt \geq 0 && (36)
\end{aligned}$$

where the first integral in (35) vanishes due to (34), and the inequality
is due to the fact that $Q_u(t) \succeq 0$ for all $t > t_1$ and Lemma 4. Thus, we
have a Gaussian distribution that complies with a covariance
constraint and also,

$$\begin{aligned}
I(X; H_1 X + N \mid U) &= I(X_G; H_1 X_G + N) \\
I(X; H_2 X + N \mid U) &\leq I(X_G; H_2 X_G + N)
\end{aligned} \tag{37}$$

assuming a parallel degraded model, that is, $0 \prec H_1 \preceq H_2$.
Now, substituting the above into the region given in equation 
(33) we obtain the following region: 

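$$\begin{aligned}
R_1 \leq I(U; Y_1) &= I(X; Y_1) - I(X; Y_1 \mid U) \\
&\leq \frac{1}{2}\log\left|I + H_1 S H_1^T\right| - \frac{1}{2}\log\left|I + H_1 B H_1^T\right| \\
&= \frac{1}{2}\log\frac{\left|I + H_1 S H_1^T\right|}{\left|I + H_1 B H_1^T\right|}
\end{aligned} \tag{38}$$

$$R_2 \leq I(X; Y_2 \mid U) \leq \frac{1}{2}\log\left|I + H_2 B H_2^T\right|. \tag{39}$$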

This concludes the converse part of the proof. The achievabil- 
ity is well-known using Gaussian superposition coding. 
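
The bounds (38) and (39) are straightforward to evaluate. The sketch below computes their right-hand sides (in nats) for hypothetical diagonal channel matrices and a covariance constraint S, sweeping the Gaussian covariance over the illustrative family $B = \alpha S$ with $0 \leq \alpha \leq 1$. This family is only a convenient example of matrices satisfying $B \preceq S$, not a parameterization used in the paper, and the function names are assumptions.

```python
import numpy as np


def half_logdet(H, K):
    """0.5 * log|I + H K H^T| in nats."""
    n = H.shape[0]
    _, logdet = np.linalg.slogdet(np.eye(n) + H @ K @ H.T)
    return 0.5 * logdet


def rate_bounds(H1, H2, S, B):
    """Right-hand sides of (38) and (39) for a Gaussian covariance B <= S."""
    r1 = half_logdet(H1, S) - half_logdet(H1, B)
    r2 = half_logdet(H2, B)
    return r1, r2


# Hypothetical degraded parallel channel (H1 <= H2) and covariance constraint.
H1 = np.diag([0.5, 1.0])
H2 = np.diag([1.0, 2.0])
S = np.array([[2.0, 0.5],
              [0.5, 1.0]])

alphas = np.linspace(0.0, 1.0, 11)
points = [rate_bounds(H1, H2, S, a * S) for a in alphas]
for a, (r1, r2) in zip(alphas, points):
    print(f"alpha={a:.1f}  R1<={r1:.4f}  R2<={r2:.4f}")
```

At $\alpha = 0$ the bound on $R_2$ is zero and at $\alpha = 1$ the bound on $R_1$ is zero, which are the two corner points of the region traced by this family.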